Mikko Tolonen, Leo Lahti
June 11, 2015
Emphasis on research process
Transparency (data, methods, reporting)
Reproducibility
Openness (unlimited access and reuse)
New modes of collaboration and initiatives
Access to data is an institutional question. Using and tidying up the data is a research question
Automation vs. point-n-click ?
Hierarchical information, only some fields relevant for our study
Load the data and tools in R:
load("df.RData")
library(bibliographica)
kable(t(df.orig[22495, ]))

| 22495 | |
|---|---|
| 008..partial-Language | English |
| 100.a-Author | Gauden, John, |
| 100.d-Author, dates | 1605-1662, |
| 100.d..partial-Author, birth | 1605 |
| 100.d..partial.1-Author, death | 1662 |
| 240.n-Part/section of a work | NA |
| 245.a-Title | Eikōn basilikē |
| 260.a-Place of publication | [London] : |
| 260.b-Publisher | Reprinted in Regis memoriam, for John Williams, |
| 260.b..partial-Printed for | John Williams |
| 260.c-Publication date | 1649. |
| 260.c..partial-Publication date, clean | 1649 |
| 300.a-Extent | [8], 175, [9] p., [2] leaves of plates : |
| 300.c-Dimensions | 10 cm (12⁰) |
| 650.a-Subject | NA |
| 650.y.651.y-Chronological subdivision | Civil War, 1642-1649;Civil War, 1642-1649. |
| 650.z.651.a.651.z-Geographic name and subdivision | Great Britain |
| 65..series-Additional years | ;;;;1642;1649 |
| -NA | NA |
Raw page counts
rawpages <- as.character(unique(df.orig[sample(nrow(df.orig), 6), "300.a-Extent"]))
kable(rawpages)

| 12p. ; |
| [24], 192 p., [2] leaves of plates (1 folded) : |
| 7, [1] p. ; |
| 1 sheet ; |
| 1 sheet ([1]) p. ; |
| [8] p. ; |
Polish page counts
polish_pages(rawpages)$total.pages
## [1] 12 220 8 2 2 8
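As a rough illustration of what such polishing involves, here is a minimal base-R sketch. It is not the `polish_pages` implementation; the helper name `estimate_pages` and its rules (two pages per sheet, two pages per leaf of plates, parenthetical remarks ignored) are assumptions chosen so that the six raw strings above give the same totals.

```r
# Simplified sketch (assumption): NOT the bibliographica implementation.
estimate_pages <- function(x) {
  # Drop parenthetical remarks such as "(1 folded)" or "([1])"
  x <- gsub("\\([^)]*\\)", "", x)
  vapply(x, function(s) {
    # Sheets: assume two pages per sheet
    sheets <- regmatches(s, regexec("([0-9]+) sheets?", s))[[1]]
    if (length(sheets) > 0) return(2 * as.numeric(sheets[2]))
    # Leaves of plates: assume two pages per leaf
    plates <- regmatches(s, regexec("\\[?([0-9]+)\\]? leaves of plates", s))[[1]]
    n_plates <- if (length(plates) > 0) 2 * as.numeric(plates[2]) else 0
    s <- sub("\\[?[0-9]+\\]? leaves of plates", "", s)
    # Remaining numbers are page sequences; sum them
    nums <- as.numeric(unlist(regmatches(s, gregexpr("[0-9]+", s))))
    sum(nums) + n_plates
  }, numeric(1), USE.NAMES = FALSE)
}

raw <- c("12p. ;",
         "[24], 192 p., [2] leaves of plates (1 folded) :",
         "7, [1] p. ;",
         "1 sheet ;",
         "1 sheet ([1]) p. ;",
         "[8] p. ;")
totals <- estimate_pages(raw)
print(totals)
```

Real catalogue entries contain many more patterns (roman numerals, ranges, multiple volumes), which is why a dedicated and tested tool is needed.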
kable(as.character(sample(unique(df.orig[, "300.c-Dimensions"]), 6)))

| 50 x 40 cm. |
| 49-50 cm. (2⁰) |
| 24 x 20 cm. |
| 50 cm (2⁰) |
| 20 cm. (8⁰) |
| 53 x 33 cm. |
Pick dimension information
kable(polish_dimensions("10 cm (12⁰)"))

| original | gatherings | width | height |
|---|---|---|---|
| 10 cm (12⁰) | 12to | NA | 10 |
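The kind of parsing involved can be sketched in base R as follows. This is a simplified illustration, not the `polish_dimensions` code: the helper name `parse_dimensions` is hypothetical, gatherings are returned as plain numbers, and only the simple "NN cm" form is treated as a height. Entries of the form "W x H cm" and ranges such as "49-50 cm." are deliberately left as NA here.

```r
# Simplified sketch (assumption): handles only "NN cm" heights and "(G⁰)" gatherings.
parse_dimensions <- function(x) {
  # Gatherings: the number before the superscript zero in parentheses, e.g. "(12⁰)"
  g <- regmatches(x, regexec("\\(([0-9]+)\u2070\\)", x))
  gatherings <- vapply(g, function(m) {
    if (length(m) > 0) as.numeric(m[2]) else NA_real_
  }, numeric(1))
  # Height: a single leading "NN cm" value; anything else stays NA
  h <- regmatches(x, regexec("^([0-9]+) cm", x))
  height <- vapply(h, function(m) {
    if (length(m) > 0) as.numeric(m[2]) else NA_real_
  }, numeric(1))
  data.frame(original = x, gatherings = gatherings, height = height,
             stringsAsFactors = FALSE)
}

dims <- parse_dimensions(c("10 cm (12\u2070)", "20 cm. (8\u2070)", "24 x 20 cm."))
print(dims)
```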
Estimate missing dimensions
kable(polish_dimensions("10 cm (12⁰)", fill = TRUE))

| original | gatherings | width | height | area |
|---|---|---|---|---|
| 10 cm (12⁰) | 12to | 10 | 15 | 150 |
Many versions of London:
x <- as.character(df.orig[, "260.a-Place of publication"])
top_plot(x[grep("London", x)], ntop = 20)
In total, 374 unique place strings contain "London": tidying up and synonym lists are needed!
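A minimal sketch of the tidy-up plus synonym-list idea, in base R. The raw variants and the one-entry synonym table below are illustrative assumptions, not taken from the catalogue or from the package.

```r
# Illustrative raw strings (assumption); "[London] :" mirrors the record shown earlier
raw_places <- c("[London] :", "London,", "London :", "Londini", "Edinburgh :")

# Step 1: strip brackets and trailing punctuation, then surrounding whitespace
clean <- trimws(gsub("[\\[\\]:,.;]", "", raw_places, perl = TRUE))

# Step 2: map known variants to a preferred form via a synonym list (hypothetical)
synonyms <- c(Londini = "London")
harmonized <- ifelse(clean %in% names(synonyms), synonyms[clean], clean)

print(unique(harmonized))
```

Keeping the synonym list as a separate, shareable table means corrections accumulate in one place and can be reused with other catalogues.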
Enriching data by external information
as.matrix(get_gender(polish_author(sample(unique(df$author.name), 20))$first)$gender)
## [,1]
## samuel "male"
## richard "male"
## william "male"
## gamaliel "male"
## robert "male"
## charles "male"
## john "male"
## prudencio "male"
## thomas "male"
## hyde "male"
## thomas "male"
## mary "female"
## thomas "male"
## zachary "male"
## henry "male"
## thomas "male"
## william "male"
## robert "male"
## n NA
## charles "male"
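Conceptually, this enrichment is a lookup of first names in an external name list. The base-R sketch below uses a tiny hypothetical table (`name_gender`) in place of the external source; names that are not in the table, such as bare initials, come out as NA, as in the output above.

```r
# Hypothetical lookup table (assumption); a real analysis needs a large,
# period-appropriate name list
name_gender <- c(john = "male", thomas = "male", mary = "female")

first_names <- c("John", "Mary", "N", "Thomas")

# Case-insensitive lookup; unmatched names give NA
gender <- unname(name_gender[tolower(first_names)])
print(gender)

# Share of each gender among the names that could be resolved
print(round(prop.table(table(gender)), 3))
```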
Authors (number of titles / paper use / life years)
Times 1470 - 1800 ?
Places: London, Ireland, Scotland, North America.. ?
Language ?
Gender ?
top_plot(df, "author.unique", 20)
Document count vs. paper for top authors
ggplot(df2, aes(x = docs, y = paper)) + geom_text(aes(label = author.unique), size = 4)
Gender distribution for authors over time. Note that name-gender mappings change over time; this has not yet been taken into account.
##
## female male
## 0.026 0.974
Publishing times 1470 - 1800 ?
Places: London, Ireland, Scotland, North America.. ?
Language ?
df2 <- df %>% filter(publication.place == "London")
df2 <- df %>% filter(language == "French")
df2 <- df %>% filter(publication.year >= 1700 & publication.year < 1800)
top_plot(df2, "author.unique", 10)
top_plot(df, "publication.place", 10)
df2 <- df %>% filter(publication.country %in% c("France", "Germany")) %>%
group_by(publication.decade, publication.country) %>%
summarize(paper = sum(paper.consumption.km2, na.rm = TRUE), docs = n())
p <- ggplot(df2, aes(x = publication.decade, y = docs, color = publication.country)) +
geom_point() + geom_smooth()
print(p)

| publication.place | paper | docs |
|---|---|---|
| London | 97.7011694 | 34928 |
| Dublin | 9.4571200 | 3293 |
| Edinburgh | 5.8437808 | 2444 |
| Philadelphia Pa | 1.5639500 | 1298 |
| Boston | 0.7236274 | 1098 |
| Oxford | 3.1651998 | 920 |
| New York N.Y | 0.7263559 | 724 |
| unknown | 0.2702561 | 489 |
| Paris | 1.6005532 | 274 |
| Glasgow | 0.9460195 | 257 |
| York | 0.4156092 | 203 |
| Cambridge | 0.8931177 | 179 |
| Providence R.I | 0.0243489 | 164 |
| Amsterdam | 0.4696873 | 160 |
| Hartford Ct | 0.1189740 | 145 |
| Bristol | 0.1602805 | 96 |
| Newcastle | 0.3798063 | 93 |
| Norwich | 0.1631005 | 93 |
| Aberdeen | 0.2579055 | 92 |
| Boston Ma | 0.1555083 | 86 |
| Cork | 0.1219026 | 86 |
| Watertown Ma | 0.0059476 | 86 |
| Charleston S.C | 0.1172916 | 80 |
| Newport R.I | 0.0259558 | 76 |
| The Hague | 0.2917593 | 74 |
| New London Ct | 0.0797421 | 69 |
| Baltimore Md | 0.0811340 | 66 |
| Salem Ma | 0.0205162 | 65 |
| Exeter | 0.4385841 | 63 |
| Lancaster Pa | 0.0153348 | 61 |
| Bath | 0.1683589 | 58 |
| United States | 0.0074780 | 56 |
| Williamsburg Va | 0.0249846 | 54 |
| Annapolis Md | 0.0284622 | 53 |
| Norwich Ct | 0.0205556 | 51 |
| Birmingham | 0.1860875 | 49 |
| Shrewsbury | 0.0425996 | 45 |
| Manchester | 0.1607316 | 44 |
| Cambridge Ma | 0.0512204 | 38 |
| New Haven Ct | 0.0179218 | 38 |
| Portsmouth N.H | 0.0121432 | 38 |
| Salisbury | 0.0755964 | 38 |
| Litchfield Ct | 0.0140826 | 34 |
| Albany N.Y | 0.0217126 | 33 |
| Nottingham | 0.0350965 | 32 |
| Basel | 0.4876664 | 31 |
| Exeter N.H | 0.0053984 | 31 |
| Calcutta | 0.5887621 | 30 |
| Coventry | 0.0159278 | 30 |
| Quebec | 0.0236815 | 30 |
| Antwerp | 0.1074526 | 29 |
| Belfast | 0.0249338 | 29 |
| Newburyport Ma | 0.0193370 | 27 |
| Canterbury | 0.1495364 | 26 |
| Kilkenny | 0.0139454 | 25 |
| Liverpool | 0.0861505 | 25 |
| Worcester | 0.0943631 | 25 |
| Chester | 0.0356998 | 23 |
| Richmond | 0.0141920 | 23 |
| Fishkill N.Y | 0.0025932 | 22 |
| Perth | 0.1905675 | 22 |
| Worcester Ma | 0.0263652 | 22 |
| Burlington N.J | 0.0726256 | 20 |
| Waterford | 0.0213785 | 19 |
| Middelburg | 0.0265354 | 18 |
| New Haven Ma | 0.0073169 | 18 |
| Rotterdam | 0.0405524 | 18 |
| Whitehaven | 0.0551667 | 18 |
| Hamburg | 0.0773574 | 17 |
| Poughkeepsie N.Y | 0.0051354 | 17 |
| Hull | 0.0933926 | 16 |
| Kingston | 0.0209270 | 16 |
| New Orleans La | 0.0026800 | 16 |
| New Bern N.C | 0.0056072 | 15 |
| Savannah Ga | 0.0025012 | 15 |
| Sherborne | 0.0586118 | 15 |
| Wilmington De | 0.0082981 | 15 |
| York Pa | 0.0010742 | 15 |
| Halifax | 0.0285274 | 14 |
| Leeds | 0.0176414 | 14 |
| Limerick | 0.0171982 | 14 |
| Sheffield | 0.0121611 | 14 |
| St. Omer | 0.0730371 | 14 |
| Darlington | 0.0235772 | 13 |
| Eton | 0.0246385 | 13 |
| Ipswich | 0.0351292 | 13 |
| Rochester | 0.0086605 | 13 |
| Trenton | 0.0027834 | 13 |
| Germantown Pa | 0.0271030 | 12 |
| Reading | 0.0309643 | 12 |
| Delft | 0.0272516 | 11 |
| Leicester | 0.0677198 | 11 |
| Leiden | 0.0265161 | 11 |
| Paisley | 0.0972585 | 11 |
| Woodbridge N.J | 0.0263863 | 11 |
| Bridgetown Barbados | 0.0090664 | 10 |
| Colchester | 0.0212154 | 10 |
| King’s Lynn | 0.0498319 | 10 |
| Danvers Ma | 0.0036897 | 9 |
| Derby | 0.0453600 | 9 |
| Basseterre Saint Kitts | 0.0065792 | 8 |
| Bury St. Edmunds | 0.1508538 | 8 |
| Concord N.H | 0.0135998 | 8 |
| Frankfurt | 0.0257214 | 8 |
| Winchester | 0.0068340 | 8 |
| Bennington Vt | 0.0089062 | 7 |
| Carlisle | 0.0245411 | 7 |
| Carmarthen | 0.0301766 | 7 |
| Dundee | 0.0083906 | 7 |
| Halifax N.S | 0.0045989 | 7 |
| Newark N.J | 0.0005083 | 7 |
| Norfolk Va | 0.0027697 | 7 |
| Northampton | 0.0014144 | 7 |
| Southampton | 0.0080863 | 7 |
| Stamford | 0.0178891 | 7 |
| Bolton | 0.0236569 | 6 |
| Carlisle Pa | 0.0067984 | 6 |
| Chelmsford Ma | 0.0003514 | 6 |
| Douai | 0.0617950 | 6 |
| Hudson N.Y | 0.0056962 | 6 |
| Nassau | 0.0024300 | 6 |
| New Brunswick N.J | 0.0064943 | 6 |
| Newark | 0.1561994 | 6 |
| Plymouth | 0.0025221 | 6 |
| Preston | 0.0045244 | 6 |
| Stockbridge Ma | 0.0024567 | 6 |
| Warrington | 0.0300346 | 6 |
| Winchester Va | 0.0047004 | 6 |
| Augusta Ga | 0.0014813 | 5 |
| Berlin | 0.0090402 | 5 |
| Chelmsford | 0.0362609 | 5 |
| Cologne | 0.0065898 | 5 |
| Doncaster | 0.0041142 | 5 |
| Geneva | 0.0249878 | 5 |
| Glocester | 0.0434457 | 5 |
| Hanover N.H | 0.0066176 | 5 |
| Hereford | 0.0046653 | 5 |
| Kingston Jamaica | 0.0006026 | 5 |
| Knoxville Tn | 0.0016772 | 5 |
| Londonderry | 0.0115085 | 5 |
| Madras India | 0.0040056 | 5 |
| Montreal | 0.0020798 | 5 |
| Richmond Va | 0.0013440 | 5 |
| Rouen | 0.0809901 | 5 |
| St. John’s Antiqua | 0.0038426 | 5 |
| Trenton N.J | 0.0179238 | 5 |
| Tunbridge Wells | 0.0015591 | 5 |
| Utrecht | 0.0148028 | 5 |
| Venice | 0.0197678 | 5 |
| Wesel | 0.0023959 | 5 |
| Amherst N.H | 0.0027788 | 4 |
| Ayr | 0.0131100 | 4 |
| Brussels | 0.0041844 | 4 |
| Chambersburg Pa | 0.0052640 | 4 |
| Cirencester | 0.0104671 | 4 |
| Dort | 0.0035728 | 4 |
| Durham | 0.0036127 | 4 |
| Halifax N.C | 0.0011423 | 4 |
| Hanover | 0.0737813 | 4 |
| Kelso | 0.0024082 | 4 |
| Kingston N.Y | 0.0000782 | 4 |
| Lansingburgh N.Y | 0.0052544 | 4 |
| Leominster Ma | 0.0020976 | 4 |
| Lexington Ky | 0.0034146 | 4 |
| Louvain | 0.0092770 | 4 |
| Newburgh N.Y | 0.0045238 | 4 |
| Portland Me | 0.0037987 | 4 |
| Roseau Dominica | 0.0033016 | 4 |
| Rutland Vt | 0.0118092 | 4 |
| St. Andrews | 0.0087112 | 4 |
| St. George’s Grenada | 0.0024300 | 4 |
| Stirling | 0.0025859 | 4 |
| Walpole N.H | 0.0155485 | 4 |
| Windham Ct | 0.0035844 | 4 |
| Banbury | 0.0013403 | 3 |
| Bern | 0.0162279 | 3 |
| Bishopstone | 0.0204759 | 3 |
| Bury | 0.0021867 | 3 |
| Chesterfield | 0.0034052 | 3 |
| Clonmel | 0.0024480 | 3 |
| Copenhagen | 0.0091520 | 3 |
| Emden | 0.0015067 | 3 |
| Ephrata Pa | 0.0012844 | 3 |
| Falkirk | 0.0031675 | 3 |
| Gainsborough | 0.0157019 | 3 |
| Galway | 0.0014490 | 3 |
| Gouda | 0.0064688 | 3 |
| Hagerstown Md | 0.0017366 | 3 |
| Hertford | 0.0015831 | 3 |
| Jacksonburgh S.C | 0.0003312 | 3 |
| Keene N.H | 0.0011445 | 3 |
| Leipzig | 0.0151905 | 3 |
| Lyon | 0.0075000 | 3 |
| Macclesfield | 0.0114114 | 3 |
| Maidstone | 0.0014928 | 3 |
| Pointe-A-Pitre | 0.0007392 | 3 |
| St. Augustine Fl | 0.0001966 | 3 |
| St. Pierre Martinique | 0.0018900 | 3 |
| Strasbourg | 0.0013338 | 3 |
| Sunderland | 0.0164409 | 3 |
| Taunton | 0.0105892 | 3 |
| Tewkesbury | 0.0065332 | 3 |
| Twickenham | 0.0195888 | 3 |
| Warren R.I | 0.0024895 | 3 |
| Westminster Vt | 0.0016556 | 3 |
| Windsor Vt | 0.0009867 | 3 |
| Wolverhampton | 0.0012776 | 3 |
| Alexandria Va | 0.0003148 | 2 |
| America | 0.0003452 | 2 |
| Andover Ma | 0.0007632 | 2 |
| Bombay | 0.0021685 | 2 |
| Caen | 0.0084756 | 2 |
| Charlottetown Pe | 0.0079274 | 2 |
| Chatham N.J | 0.0005814 | 2 |
| Columbia S.C | 0.0010180 | 2 |
| Dover | 0.0023200 | 2 |
| Dover N.H | 0.0006290 | 2 |
| Downpatrick | 0.0001966 | 2 |
| Drogheda | 0.0015034 | 2 |
| Dumfries | 0.0013927 | 2 |
| Dumfries Va | 0.0014448 | 2 |
| Elizabeth N.J | 0.0064836 | 2 |
| Florence | 0.0060072 | 2 |
| Fredicksburg Va | 0.0012732 | 2 |
| Fryeburg Me | 0.0015660 | 2 |
| Gloucester | 0.0110184 | 2 |
| Gothenburg | 0.0003956 | 2 |
| Gravesend | 0.0097328 | 2 |
| Haarlem | 0.0007248 | 2 |
| Harrisburgh Pa | 0.0028025 | 2 |
| Haverhill N.H | 0.0016245 | 2 |
| Hillsborough N.C | 0.0009450 | 2 |
| Howden | 0.0016321 | 2 |
| Kilmarnock | 0.0053922 | 2 |
| La Rochelle | 0.0028282 | 2 |
| Lausanne | 0.0056440 | 2 |
| Lincoln | 0.0000616 | 2 |
| Ludlow | 0.0019380 | 2 |
| Mechelen | 0.0064680 | 2 |
| Montego Bay | 0.0016200 | 2 |
| Montrose | 0.0280108 | 2 |
| New Bedford Ma | 0.0024402 | 2 |
| New Windsor N.Y | 0.0013646 | 2 |
| Newbury Ma | 0.0041888 | 2 |
| Newbury Vt | 0.0015660 | 2 |
| Newry | 0.0009088 | 2 |
| Northampton Ma | 0.0015352 | 2 |
| Peterborough | 0.0028158 | 2 |
| Petersburg Va | 0.0008478 | 2 |
| Poole | 0.0015808 | 2 |
| Port-au-Prince | 0.0008398 | 2 |
| Portsmouth | 0.0008100 | 2 |
| Reading Pa | 0.0031616 | 2 |
| Regensburg | 0.0017566 | 2 |
| Schenectady N.Y | 0.0021120 | 2 |
| Shepherdstown Va | 0.0049326 | 2 |
| Springfield Ma | 0.0006543 | 2 |
| St. Ives | 0.0004928 | 2 |
| Staunton Va | 0.0013044 | 2 |
| Strabane | 0.0103954 | 2 |
| Sudbury | 0.0075829 | 2 |
| Vevey | 0.0167960 | 2 |
| Vienna | 0.0121524 | 2 |
| Washington D.C | 0.0002268 | 2 |
| Wilmington N.C | 0.0010076 | 2 |
| Windsor | 0.0051870 | 2 |
| Winterthur | 0.0050635 | 2 |
| Wisbech | 0.0008742 | 2 |
| Abbeville | 0.0072556 | 1 |
| Abingdon | 0.0001235 | 1 |
| Aldermanbury | 0.0013585 | 1 |
| Alnwick | 0.0019712 | 1 |
| Altmore | 0.0001350 | 1 |
| Altona | 0.0019760 | 1 |
| Ampthill | 0.0000616 | 1 |
| Augusta Me | 0.0005700 | 1 |
| Bath N.Y | 0.0006804 | 1 |
| Beverley | 0.0015314 | 1 |
| Birstall | 0.0001425 | 1 |
| Blackburn | 0.0488700 | 1 |
| Bottisham | 0.0000616 | 1 |
| Bouillon | 0.0158728 | 1 |
| Boulogne | 0.0013832 | 1 |
| Brattleborough Vt | 0.0001296 | 1 |
| Brentford | 0.0005206 | 1 |
| Bridgeton N.J | 0.0007224 | 1 |
| Bridgnorth | 0.0001350 | 1 |
| Brookfield Ma | 0.0019475 | 1 |
| Buckden | 0.0001350 | 1 |
| Bungay | 0.0008398 | 1 |
| Burlington Vt | 0.0007830 | 1 |
| Burnley | 0.0274120 | 1 |
| Burton | 0.0001425 | 1 |
| Campbeltown | 0.0003696 | 1 |
| Cap Haitien | 0.0000000 | 1 |
| Carlow | 0.0003458 | 1 |
| Cashel | 0.0007830 | 1 |
| Charlestown Ma | 0.0002850 | 1 |
| Charlottesville Va | 0.0000000 | 1 |
| Chatham | 0.0001350 | 1 |
| Chesire Ct | 0.0000475 | 1 |
| Cincinnati Oh | 0.0006642 | 1 |
| Concord Ma | 0.0001000 | 1 |
| Danbury Ct | 0.0002850 | 1 |
| Daventry | 0.0005400 | 1 |
| Dedham Ma | 0.0063726 | 1 |
| Deptford | 0.0001350 | 1 |
| Devizes | 0.0000124 | 1 |
| Dorchester | 0.0008100 | 1 |
| Dresden | 0.0472031 | 1 |
| Dunbar | 0.0002850 | 1 |
| East Molesey | 0.0007904 | 1 |
| Easton Md | 0.0001092 | 1 |
| Edenton N.C | 0.0008100 | 1 |
| Egham | 0.0002470 | 1 |
| Elizabethtown Md | 0.0002964 | 1 |
| Ennis | 0.0007904 | 1 |
| Europe | 0.0002464 | 1 |
| Evesham | 0.0117040 | 1 |
| Frederick Md | 0.0000000 | 1 |
| Fredericton | 0.0005400 | 1 |
| Gateshead | 0.0004928 | 1 |
| Gaunt | 0.0000000 | 1 |
| Gdansk | 0.0009856 | 1 |
| Geneva N.Y | 0.0007830 | 1 |
| Ghent | 0.0126900 | 1 |
| Glocester Ma | 0.0001350 | 1 |
| Goa | 0.0001482 | 1 |
| Grantham | 0.0001425 | 1 |
| Greenfield Ma | 0.0019000 | 1 |
| Greenock | 0.0007904 | 1 |
| Grenada | 0.0033880 | 1 |
| Guernesey | 0.0006175 | 1 |
| Halle | 0.0105819 | 1 |
| Harlow | 0.0009880 | 1 |
| Haverhill Ma | 0.0009856 | 1 |
| Heidelberg | 0.0035728 | 1 |
| Horncastle | 0.0001350 | 1 |
| Houghton Park | 0.0000247 | 1 |
| Inverlochie | 0.0003696 | 1 |
| Kendal | 0.0005550 | 1 |
| Kirkcudbright | 0.0006160 | 1 |
| Knaresborough | 0.0005681 | 1 |
| Koningsberg | 0.0000616 | 1 |
| Lancaster N.J | 0.0000000 | 1 |
| Leeuwarden | 0.0032851 | 1 |
| Lewes | 0.0000000 | 1 |
| Liege | 0.0108834 | 1 |
| Lille | 0.0024640 | 1 |
| Margate | 0.0006160 | 1 |
| Marlborough | 0.0008100 | 1 |
| Martinsburg Va | 0.0001050 | 1 |
| Medford Ma | 0.0000000 | 1 |
| Minorca | 0.0001350 | 1 |
| Monmouth | 0.0012844 | 1 |
| Monmouth N.J | 0.0059280 | 1 |
| Mount Holly N.J | 0.0002850 | 1 |
| New Bedford Ms | 0.0000000 | 1 |
| Newfield Ct | 0.0007830 | 1 |
| Newport Isle Wight | 0.0084227 | 1 |
| Newton N.J | 0.0007224 | 1 |
| Niagara | 0.0001482 | 1 |
| North Shields | 0.0002850 | 1 |
| Northallerton | 0.0010868 | 1 |
| Ossining N.Y | 0.0007224 | 1 |
| Paris Ky | 0.0007656 | 1 |
| Parr-Town | 0.0001350 | 1 |
| Pembroke Ma | 0.0016200 | 1 |
| Penrith | 0.0000988 | 1 |
| Raleigh N.C | 0.0046788 | 1 |
| Reims | 0.0019760 | 1 |
| Rochdale | 0.0002850 | 1 |
| Rodborough | 0.0003952 | 1 |
| Rome | 0.0024453 | 1 |
| Roscrea | 0.0008100 | 1 |
| Saarbrucken | 0.0370366 | 1 |
| Salamanca | 0.0001350 | 1 |
| Salem N.Y | 0.0008100 | 1 |
| Salisbury N.C | 0.0008100 | 1 |
| Savanna-la-Mar Jamaica | 0.0008100 | 1 |
| Scipio N.Y | 0.0008100 | 1 |
| Shelburne Nova Scotia | 0.0002464 | 1 |
| Shiffnal | 0.0002850 | 1 |
| Siena | 0.0022325 | 1 |
| Sligo | 0.0004750 | 1 |
| South Shields | 0.0028619 | 1 |
| Spanish Town Jamaica | 0.0006804 | 1 |
| St. Albans | 0.0391500 | 1 |
| St. Eustatius | 0.0003024 | 1 |
| St. Germans | 0.0002700 | 1 |
| St. Helier | 0.0002464 | 1 |
| St. Mary’s Md | 0.0001350 | 1 |
| Stafford | 0.0002700 | 1 |
| Stockton | 0.0059136 | 1 |
| Trefeca | 0.0167082 | 1 |
| Trevoux | 0.0000950 | 1 |
| Verdun | 0.0000000 | 1 |
| Vergennes Vt | 0.0005928 | 1 |
| Walsall | 0.0010780 | 1 |
| Waltham | 0.0001350 | 1 |
| Warwick | 0.0011115 | 1 |
| West Springfield Ma | 0.0011856 | 1 |
| Wexford | 0.0006240 | 1 |
| Weymouth | 0.0011115 | 1 |
| Wigan | 0.0019594 | 1 |
| Winton | 0.0072556 | 1 |
| Wokingham | 0.0008100 | 1 |
| Wrexham | 0.0024640 | 1 |
| Yarmouth | 0.0005928 | 1 |
| Yeovil | 0.0008100 | 1 |
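A per-place summary like the table above can be computed with a grouped sum and count. The sketch below uses base R and a small hypothetical data frame (`toy`); only the column names `publication.place` and `paper.consumption.km2` come from the slides, and this is not necessarily how the table was generated.

```r
# Toy data (assumption): five documents with paper consumption estimates
toy <- data.frame(
  publication.place = c("London", "London", "Dublin", "Edinburgh", "London"),
  paper.consumption.km2 = c(0.002, NA, 0.001, 0.003, 0.004),
  stringsAsFactors = FALSE
)

# Total paper per place, ignoring missing estimates
paper <- tapply(toy$paper.consumption.km2, toy$publication.place, sum, na.rm = TRUE)
# Document count per place
docs <- table(toy$publication.place)

by_place <- data.frame(
  publication.place = names(paper),
  paper = as.numeric(paper),
  docs = as.integer(docs[names(paper)]),
  stringsAsFactors = FALSE
)
# Sort by document count, largest first
by_place <- by_place[order(-by_place$docs), ]
print(by_place)
```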
ggplot(df2,
aes(x = log10(1 + docs), y = log10(1 + paper))) +
geom_text(aes(label = publication.place), size = 3) +
scale_x_log10() + scale_y_log10()
Scotland, Ireland, US comparison:
df2 <- df %>%
filter(!is.na(publication.country)) %>%
group_by(publication.country) %>%
summarize(paper = sum(paper.consumption.km2, na.rm = TRUE),
docs = n()) %>%
arrange(desc(docs)) %>%
filter(publication.country %in% c("Scotland", "Ireland", "USA"))
p1 <- ggplot(subset(melt(df2), variable == "paper"), aes(y = value, x = publication.country)) + geom_bar(stat = "identity") + ylab("Paper consumption")
p2 <- ggplot(subset(melt(df2), variable == "docs"), aes(y = value, x = publication.country)) + geom_bar(stat = "identity") + ylab("Title count")
grid.arrange(p1, p2, nrow = 1)
What can we say about the nature of the documents? Pamphlets (<32 pages) vs. books (>120 pages)?
Book size statistics and development over time
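The pamphlet/book split can be expressed as a simple rule on page counts. In this base-R sketch the thresholds come from the slide, while the label "other" for the range in between and the example page counts are assumptions.

```r
# Example page counts (assumption), including a missing value
pages <- c(8, 31, 32, 120, 121, 300, NA)

# Pamphlets: fewer than 32 pages; books: more than 120 pages
doc_type <- ifelse(pages < 32, "pamphlet",
                   ifelse(pages > 120, "book", "other"))
print(table(doc_type, useNA = "ifany"))
```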
Estimated paper consumption by document size
~80 % of statistical analysis is tidying up the data. This step is too often neglected, and many tools implicitly assume tidy input. We provide new, efficient tools for this step as well
With open data principles, no need to reinvent the wheel for the same (or similar) datasets
Things become stable. The research tool is corrected and perfected when it is transparent & potentially used also by others
Possibilities for reuse with similar datasets are great
Automation allows reporting with minimal human intervention
Innovative use of computational and statistical methods
New tools for old questions derived from the discipline itself
Vast amounts of useful data not being shared or utilized
Open access is not enough. We need open sharing of research data and methods to study “traditional” questions
Institutions that hold the raw data are reluctant to give full access to data (even to researchers of the same institution). Why?
The research process is not open and research data is not shared in the Humanities. Transparency, reproducibility, collaboration and new initiatives are missing. Why?
Short answer: Cultural change takes time. We need concrete examples in the core field of the Humanities that actually prove OPEN DATA PRINCIPLES useful.
Page count: distribution for documents with different sizes.
Estimated title count by document size
Top-4 places (title count), mean page count over time.